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Abstract. Surrogate models are used for global approximation of responses generated 
by expensive computer experiments like CFD applications. In this paper, we make use 
of structural similarities which are shared by a class of related problems. We identify 

t—5 these structures by applying statistical shape models. They are used to build a generic 

surrogate model approximation to sample data of a new problem of the same class. In 
a variable fidelity framework the generic surrogate model is combined with the sample 

^ data to generate an efficient and globally accurate interpolation model, which requires 

less costly sample evaluations than ordinary response surface methods. We demonstrate 
our method with an aerodynamic test case and show that it significantly improves the 
approximation quality. 

1 Introduction 

(N 

t> 

In multidisciplinary numerical simulation and optimization, surrogate modeling has 
gained popularity during the last two decades. Numerical computation of realistic mod- 
{N| els still is challenging and computationally intensive. When the global behavior of an 

input-output relationship of a computer experiment is sought, dense evaluations over the 
whole input parameter space are out of reach. Surrogate models or also called metamod- 
els or response surface models approximate or interpolate the output of a computer code 



based on a moderate number of evaluations. Typically, the evaluation of the computer 
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experiment dominates the overall computational cost of model generation. For the sake 
of efficiency, methods have to be designed to use as few samples as possible without 
sacrificing accuracy. Radial basis functions and the Kriging method are widely used be- 
cause of their flexibility and ability to interpolate highly nonlinear functions. Improving 
these methods is a major topic of ongoing research: On the one hand, adaptive sampling 
strategies are investigated to reduce the number of required evaluations jH] , [37] . On the 
other hand, additional information which is assumed to be correlated with the response 
is used to improve the accuracy of approximation, like gradient enhanced Kriging (GEK) 
|ZZ|, 133,123, Cokriging [12], US], US and variable fidelity modeling (VFM) [UJ,[32]. 

All these approaches treat the response itself more or less as an unknown output of 
a black-box. This paper is motivated by the assumption that for a predefined problem 
class, the behavior of the response is not arbitrary, but rather related to other instances 
of the mutual problem class. For example in computational fluid dynamics (CFD), 
responses of aerodynamic coefficients, depending on the input parameters Mach num- 
ber and angle of attack, share structural similarities for different airfoil geometries. To 
identify these structures we make use of the concept of statistical shape models, which 
use a principal component analysis to quantify modes of variation of a previously com- 
puted training database. If functions of a problem class are accessible in form of such 
a database, a new test case of this function family can be approximated using only few 
evaluations. Instead of directly interpolating the samples, first the principal components 
are fitted to the data which then act as a generic surrogate model (GSM). Based on this 
model, interpolation of the samples is performed in a VFM framework. 

The statistical interpolation method Kriging has been widely used in geostatistics since 
the 1950s [22j,[31j, it was introduced in surrogate modeling for computer experiments in 
the eighties [38] and is nowadays applied to CFD simulation and optimization, see e.g. 
|27|.|24].|14|. A survey about sampling strategies in optimization is given in |41j . while 
adaptive sampling strategies for global approximation can be found in [5].|27j.[7].[T5]. 
The authors in [20] present an early review and a recent one is given in [37J. Another 
framework of improving the approximation quality uses secondary information about the 
response. Gradient enhanced Kriging (GEK) incorporates derivatives [33], [21], which are 
often available by adjoint computations. This approach is used in various fields of aero- 
dynamic applications such as optimization [5] , [29J , [26J , [47] , uncertainty quantification 
[5] , [30] and global approximation [27] , [37] . GEK can be interpreted as a special form of 
Cokriging. This method also originates from geostatistics [46J and enables the incorpo- 
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ration of any auxiliary variable which is correlated with the primary one. In the last 
few years, it was also applied in aerodynamics and engineering |13j,[I2j,[19], where the 
approximation of few costly computations of high fidelity could be improved by a large 
number of evaluations of a cheaper low fidelity model. Other variable fidelity modeling 
(VFM) methods use bridge functions to correct the discrepancy between data of low 
and high fidelity and date back almost as long as surrogate modeling for computer ex- 
periments itself, e.g. [3]. Nowadays they are also used in aerodynamics pH] . |36j . [T?j . [28j . 
Recently a new VFM method was developed |16j . which overcomes the difficulties of 
model building and robustness in Cokriging as well as the problems of accuracy and 
missing mean squared error prediction in the correction based VFM methods. While 
statistical shape models are popular in the fields of computer vision and (medical) image 
processing [6] , [25] , [8] , to our knowledge this paper represents the first attempt to use this 
technique in response surface methods for global approximation of expensive computer 
experiments. 

The outline is as follows. Section [2] presents Kriging as a surrogate modeling frame- 
work for computer experiments. A database of surrogates for a predefined problem class 
from aerodynamic simulation is considered in section [3] and the problem of establishing 
a correspondence between the database elements is discussed. The concept of statistical 
shape models as well as the POD method for L 2 -functions are presented in section [4j In 
section [5j we describe the gappy POD method, extend it to the continuous case and ex- 
plain how the gappy POD fit of the POD basis elements to some sample data establishes 
a generic surrogate model. We discuss how the generic surrogate model approximation 
and the sample data are combined by a variable fidelity modeling interpolation in section 
[6j In the final section, we present numerical results of the generic surrogate modeling 
technique for responses of aerodynamic coefficients with a database of several airfoil 
geometries and compare them to common Kriging. 



2 Surrogate modeling 



We consider a computer experiment like a CFD solver, whose evaluation is expensive. 
It is treated as a black-box, depending on input parameters x E R rf and we observe 
a scalar response y(x). We want to approximate the unknown function y : M d — > R 
in a domain of interest Q C M d by a surrogate model y(x) which can be evaluated at 
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low computational cost. The Kriging method is a statistical method of interpolating n 
given data pairs {(xi,y(xi))}^_ 1 of evaluations of the unknown function y(x). In this 
section we only give a brief review, following [38] and |21j . For more information about 
constructing the surrogate we refer to [33] , [2] , [315] . For the interpolation model, the 
deterministic response is treated as a sum of a linear regression part and a "lack of fit" 
term 

K 

y(x) = J2Pkfk{x) + z{x), xeflcR d , (1) 
fc=i 

where fk(x) : M rf — > K are known functions with coefficients /3& S M and z(cc) is the 
realization of a stationary Gaussian process, which captures the nonlinear behavior of 
the response. The regression part usually consists of low order polynomials (simplest 
case: K=l, f\ = l) and the Gaussian process is assumed to satisfy 

E[z(x)} = 0, Cov[z(x),z(w)] = a 2 R(w-x) Vx,weQ, (2) 

where a 2 is the process variance and R(w — x) is a spatial correlation function, which 
will be explained further at the end of this section. 

The Kriging predictor 

y(x) := c(x) T Y (3) 

is a weighted sum of the given response evaluations Y := (y(x\), . . . , y(x n )) T . With the 
weights Ci(x) chosen as the solution of the optimization problem 



min MSER/O)] = E[(c(x) T Y - y(x)) 2 ] 
c(x)em n 

s.t. E[c(x) T Y - y(x)\ = 0, 



(4) 



it is also a best linear unbiased estimator (BLUE). The solution of Q is given by the 
solution of the linear equation 



R F 
F T 





(5) 



where R = [R(xi, Xj)] i j £ ]R nxn denotes the positive definite and symmetric correlation 
matrix, r(x) = (R(x,Xi)) i € M n contains the correlations between x and every Xj. 
F = [/fc(^i)]j k e ^ nxK is the linear regression design matrix and f(x) = (fk{x)) k 6 R K 
contains the evaluations of the design functions in x. fJ,(x) G M. K are the Lagrange 
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multipliers for the unbiasedness condition in Q. 

Using ([5]), a closed term for the Kriging estimator ^ can now easily be derived: 



y(x) = c(x) T Y 



r(x) 



T 



R F 

F T 



(6) 



Inserting any Xi {i = 1, . . . , n), it becomes obvious that the Kriging predictor is in fact 
an interpolator (y(xi) = y(xi)). Solving the linear equation independently from x in ^ 
only once for a given dataset, the Kriging predictor can be evaluated efficiently at the 
cost of a dot product of size n + K, ending up with a cheaply accessible surrogate model 
for y(x). 

The spatial correlation function ([2]) is modeled as a product of one dimensional cor- 
relation functions 

d 

R(w-x,9) = Y[R< k \\wW -xW\,6W), (7) 
k=i 

still depending on so-called hyperparameters 9 = (9^\ ■ ■ ■ , 0^) which determine the 
correlation lengths. Popular choices for the correlation function are exponential func- 
tions of the type R( h \\w^ — x^\, 6^) = exp {— 9^\w^ — p £ [1, 2], or cubic 
splines, all satisfying R(0) = 1 and R(\w — x\) > 0. The hyperparameters 6^ 

\w—x\—toc 

are usually determined by solving a maximum likelihood problem 

max (2n)-%a- n (detR{6))- 1 2exp\--^(Y-F(3) T R(6)- 1 (Y-F(3)X . (8) 

R{6) again denotes the correlation matrix, now depending on 9. It is possible to cancel 
out j3 and a 2 by necessary first order conditions 

p(o) = (f t r(9)- 1 fY 1 F T R{6)- 1 Y (9) 



n 

and the problem can be reduced to 

min 



a 2 (9, (3(9)) = V - FmVm^iY - Ff3(9)) (10) 



{a 2 (9,P(9))(detR(9)^Y (11) 
However, when solving the maximum likelihood problem with a gradient based algo- 
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rithm, formulation (11) can cause difficulties due to highly nonlinear implicit dependen- 
cies and it is advisable to rather use pi) |37j . 



3 Function database for aerodynamic simulation 



We now concretize our test case for better understanding without loss of generality. Our 
goal is to generate surrogate models for scalar aerodynamic coefficients lift (q), drag 
{a,]) and pitching moment (c m ) for two-dimensional airfoil geometries, which depend 
on the input parameters Mach number (Ma) and angle of attack (a). For every input 
parameter configuration, the response can be evaluated by running a simulation with 
a (computationally intensive) CFD solver. We assume that the responses of different 
airfoils constitute a mutual problem class and share structural similarities. When already 
having generated surrogate models for several airfoil geometries, theses similarities can 
be used to improve the quality of a surrogate model for a new instance of this problem 
class. 

We consider a database of surrogate models yi(x), . . . ,y m (x) for the same problem 
class and we omit the superscript for simplicity (yi := %). In our case, each yi 6 C 1 (f2) is 
the response of an aerodynamic coefficient (e.g. lift) depending on the input parameters 
x = (Ma, a) for a particular airfoil geometry. So the database consists of previously 
computed response surface functions corresponding to m different airfoils. Furthermore 
we assume each response function to have a sufficient accuracy, i.e. it represents a globally 
valid surrogate model in the domain of interest Jlcl 2 . 

In the fields of computer vision and image processing, statistical shape models built 
from a dataset of examples have been widely used [8]. After establishing correspondence 
between the database functions by admissible transformations, also called alignment, 
a principal component analysis is performed to identify the most important modes of 
variation. Often Euclidean or similarity transformations are used to establish correspon- 
dence between the images, e.g. see [BJ. For an overview on image registration techniques 
and possible transformations see [TT] . |43j . |48j . 

As an admissible transformation we define the following one, which is an affine trans- 
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formation in each dimension of Q C M 2 and in the image space y(f2) C M: 

(12) 



,(i) ( i + + 

Vj {x(q j ),q j ) ■= yj {x(q j )) (l + <?I) + <i (13) 



parametrized by q 3 E K 6 (j = 1, . . . , m). Note that the transformation is chosen such 
that Ijj (x(0),0) = yj(x) for q 3 = 0. We emphasize that this transformation was found 
to be suitable for our test cases, other problem classes could require other admissible 
transformations. One function yi{x) is defined as a reference, meaning no transformation 
is applied (q 1 := 0). For the other transformation parameters q 2 , . . . , q rn an optimization 
problem with 6(m — 1) unknowns has to be solved: 



mm 

q 2 ,...,q n 



1 rn f 2 A — 

V ' 3=1 k>j JQ 3=2 



The solution minimizes the overall sum of squared differences, while for robustness a 
penalty term is included which guarantees that the transformation does not become too 
large. This nonlinear least squares problem is solved by a GauB-Newton algorithm |35j . 



Some computational issues will now be addressed briefly. The integral in (14) must 
be approximated by a numerical quadrature 

N 

/ f(x)dxx>y2wif(xi), (15) 
Jn i=1 



e.g. with the Xi as elements of a viV x v iV-tensorgrid and positive weights Wi. With 
the term 

e%j,k ■= Vwl (yj(x(xi,q 3 ),q 3 ) -y~k( x ~{x i) q k ) ) q k )^ , (16) 

A r m(m — 1) 

e G i 2 ; (17) 



we can write the first summand of ( 14 ) in discretized form as a sum of squared differences 

m N 

v ' fc=i ■ 



j>fc i=i 
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Then using 



J:=^M G M^^x^ (19) 
oq 



the Gaufi-Newton algorithm needs the gradient and an approximation to the Hessian 

V q SSD(q) = J T e G M 6m (20) 
V 2 q SSD(q) w J T Jel 6mx6m . (21) 

Each entry of the symmetric matrix J T J is a dot product of length _/y m (™~ 1 ) anc i there 
are 6m ( 6 ^ n+1 ) entries to compute, so the algorithm can really benefit from parallelization. 
Furthermore, the memory usage by storing the matrix J is 0(Nm 3 ). 



4 Proper orthogonal decomposition 

For the dataset (yi,...,y m ), yi G L 2 (J7;IR), we want to perform a proper orthogo- 
nal decomposition (POD) which is also called principal component analysis (PCA) or 
Karhunen-Loeve transformation, see also pQ or |42j . Again, we omit the superscripts for 



simplicity (yi := y i: see (|3j) , ( 13 ) ) . We keep in mind that in this paper, all functions are 
surrogate models which have been aligned by admissible transformations, but the POD 
method applies for any yi G L 2 (ft) as well. In PCA significant structures of the dataset 
are identified which is realized by an orthogonal decomposition of the covariance matrix 



C (y) .-_ 



..e 8 ™ (22) 



We point out that classically mean centered functions are considered, i.e. the proper 
orthogonal decomposition is performed on (yi, . . .,y m ), m := yi — ~ Y?k=i i 8 J- Tnis 
leads to a decomposition of the space of variations from the mean instead of the space 
spanned by the functions themselves. In the statistical shape model 

- m I 

-Y, yk + Y, a ^ (23) 

k=l 0=1 

the principal components of variation ipj G L 2 (Q) from the mean are controlled by 
parameters aj G R (J = 1, . . . , I). We investigated proper orthogonal decompositions of 
both mean centered (y%, . . . , y m ) and plain datasets (j/i, . . . , y m ). In our test cases, mean 
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centering did not produce better results while increasing the complexity of algorithmic 
implementation, so we decided to use the plain formulation. 



In this section, we introduce the POD method for the continuous case and follow the 
discussion of |45] . Let $7 C M. d be bounded, L 2 (fi;IR) denotes the Hilbert space with 
U,9)tf(n) = In f( x )a(x)dx and ||/||| 2 ( n ) = f n f(x) 2 dx for all f,g G L 2 (Q). We assume 
the m functions yi G L 2 (Q) (i = 1, . . . ,m) to be linear independent and define y(x) := 
(yi(x), . . . , y m {x)) G W 11 . For the POD method, consider y m = span {yi, . . . , y m } C 
L 2 (Q) of dimension m. Let {ipi, ■ ■ ■ , ipm} be an orthonormal basis of y m . Clearly, when 
using the complete basis, every yi can be represented by a linear combination of its 
elements 



(24) 



But we search for an orthonormal set of functions {ipi, ■ ■ ■ , ipi} C y m of dimension 
I < m, which describes y m as good as possible, i.e. every yi is approximated by a linear 
combination of {ipi, ■ ■ ■ ,ipi}- This leads to the following optimization problem: 



m I 

t=l i=l 



S.t. 



(25) 



.0 



Expanding the norm in (25) and using (V'i>V'i)i^(n) = ^tf yi e lds an equivalent formula- 
tion: 



max > > (yi,ibi) 

S.t. (lpi,1pj) L 2 {U) = 



2 

L 2 (Q) 



(26) 



(i, j = 1, ... ,0 



The solution {^i, . . . , V'z} of (26) is called POD-basis of rank I. We now briefly explain 
how the POD-basis is determined. 



We define the operator C : L 2 (Q) — > y m , 

m 

Ctf> ■= ^2(^,yi)mn)Vi- ( 27 ) 

i=l 

Then for C exists a series of orthonormal eigenfunctions {V'ili^i and corresponding 
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nonnegative real eigenvalues {Aj}^ with 



Cifii = 
Ai > . . . > X m > 
Xi = (i > m), 



(28) 
(29) 
(30) 



see [23]. Furthermore, is an orthonormal basis for L 2 ($7) and spanj^i, . . . , -0 m } = 

y m , which implicates that {if)\, ... ,0m} is an orthonormal basis for 3^ m - Also, for any 



I < m {V'l 5 - • • is the unique solution of (25). The error can be expressed by the 
sum of the m — I remaining eigenvalues 



i=i 



i=l 



= E v 

L 2 (n) j=l+1 



(31) 



So the POD-basis is determined by the first I eigenfunctions of C which correspond 



to the / largest eigenvalues. A finite approach for solving the eigensystem (28) can be 
derived [35]: Because ipk £ y m (A: = 1, ...,/), we set 



^k = K k ^2viyi (k = 1, . . . ,/). 



(32) 



i=l 



Inserting (32) into the eigensystem (28) yields 



m / rn 



3=1 \i=l J j=l 

and exploiting the linear independence of {y\, . . . , y m } we conclude 

m 

^2(yi,yj) L 2(n) v i = X kVj (k = l,...,l; j = l,...,m). 



(33) 



(34) 



This is a discrete eigenvalue problem and with the covariance matrix C = from (f22[) 
we can write it as 

Cv k = \ k v k (k = l,...,l). (35) 

So for determining the POD-basis {ipi, ■ ■ ■ , ipi} of rank I one has to compute the I eigen- 
vectors v k E R m of C which correspond to the I largest eigenvalues A&. Setting K k = -4= 
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for normalization in (32), the POD-basis elements are 

^ m 

^ = -=^^ (k = l,...,l). (36) 



Xk i=i 



With notations Vi := [vf] ik 6 M mx/ and := diag(-s/Ai, . . . , v^) G M' xi we can also 
write 



5 Gappy POD in Hilbert spaces 

The gappy POD method was first introduced in |10j for reconstructing images of faces 
from incomplete data. A mask function was used which set the greyscale of every pixel 
to zero which was not part of the incomplete dataset. In [2], the gappy POD method 
was applied to CFD problems for the first time and a selection vector was used to set 
the missing flow solution vector's entries to zero. We now introduce a straightforward 
approach for applying this methodology to L 2 (Q), where the gappy data is given by 
function evaluations on a discrete subset of 0. But first we briefly recap how a "non 
gappy" function is approximated by the POD-basis. 

When approximating an arbitrary function cp £ L 2 (Q) (not necessarily <f> 6 y m ) with 
the POD-basis, the solution of the optimization problem 



1 

min - 

»«..,a<*>en 2 



l 2 



i=i 



(38) 
L 2 (Q) 



is sought. This is a linear least squares problem and using optimality conditions and 
exploiting (tpi,ipj) = $ij one derives 

af = ^i) LHn) (j = l,...,0- (39) 

The approximating function <j>(x) := ^Laj ^-(j;) is a projection of <p(x) onto the 
subspace y l := span{V>i, ... ,-0;} C y m C L 2 (f2). 

Now for the gappy POD method, suppose 4>(x) itself is unknown again and only a set 
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of n data pairs of evaluations 



{(^(^OKU, (40) 

Xi G O C M ', 0(xj) el, Xi ^ Xj (i / j), (i = 1, . . . ,n) 

is available. Based on the data, we want to reconstruct the unknown function G L 2 (S7). 
Assuming there is a strong relation between <f> and y m , we approximate (j) by a G y l . 
Similar to ( 38 ) , we pose an optimization problem 



1 n / 1 \ 2 

M ™% oE U>(x,)-E a f^(x,) , (41) 

which again is a linear least squares problem. Clearly, though the ipj are orthonormal 
in L 2 (Q), meaning (ipi,'4'j)L 2 (n) = <%> this is not the case anymore on a subset of Q. 
Particularly, Ylk=i i ) i{ x k)' t l } 3( x k) {i ^ j) is generally not equal to zero. Introducing the 
design matrix 

* := (x i )]" = / li=1 G M nxZ , n > I, rankf = /. (42) 
and ^ := (0(xi ) , . . . , 4>(x n )) T G M nxl , the solution of (41 ) is given by the linear equation 

^ T = ^ T ip. (43) 



Since every ipj (J = 1, . . . , I) is a linear combination of {yi, . . . , y m }, we now derive a 



framework for avoiding multiple redundant evaluations of yi(x) both while solving (43) 



and also for evaluating 4>(x) = X]j=i a j Yj( x )- Even if each yj(x) is a surrogate model, 
depending on its complexity and /, m and n, the evaluations needed can be a bottleneck 
in the model generation. We define 

Y := M*0&=1 e KnXm ( 44 ) 



and analogously to (37) we get 

^ = YVi^l 1 . (45) 



12 



Also, with (37) we get for the approximating function 



i 

i=i 
= ^(i) W 

= y(x) V^aW (46) 

:=a(a)eR mxl 

such that we have a closed form of <p(x) = y(x)a^ = YllLi a j Vj{ x ) as a linear combi- 
nation of {yi, . . .,y m }. 

Because the database elements yi, . . . , y m have been transformed by solving the align- 
ment problem ( |14[ ), the unknown function 4>(x) (respectively its known evaluations 
{(xi) <A(xj))}" =1 ) has to be allowed a transformation of the same class. At this point 
the task is not finding an overall alignment, but fitting the POD basis to a fixed set of 
data pairs {(x^, (f)(xi))}™ =1 . So rather than transforming cj>(x), we (equivalently) apply 
a transformation to the approximating function (f>(x), parametrized by p. The double 
bar indicates that we deal with a second transformation after the already known initial 
transformation of the database functions by alignment (parametrized by q): 

W(p) . = MM 1 (47) 

x^Hl + ql) +4) (1+Pl)+P2 N 
xW(l + qi)+qi) (l+p 3 )+PA / 

Vj @(p),v j ) ■= Vj (5(p)) (i + 4) + 9b> ( 49 ) 

(f(p),p,aW) := 4> (f(p),aW) +p 5 (50) 
= y(W{p),q)V l Z- 1 aM+p 5 . (51) 



(48) 



Here p does not contain a scaling parameter like q J in (13), because it would be linear 
dependent on a^K The linear least squares problem (41) is then augmented by the 
transformation parameters p£R 5 and again a penalty term is included: 

1 " / ~ \ 2 S 

aWeR^SR 5 2 ~^ V V ' / 2 
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This nonlinear least squares problem is solved by a Gaufi-Newton algorithm for a and 



p simultaneously. Unlike (14), this can be accomplished without computational issues. 



Intuitively, an initial value for the algorithm is p = (no transformation) and its corre- 



sponding linear least squares solution = (<I' T \I') \p (43). 



We call the approximation eft (x(p),p, o'") to the unknown function <j>(x) G L 2 (£l) 
based on the data pairs of evaluations {(xf, (f>(xi))}™ =1 and the function database {yi, . . . , y m } 
a generic surrogate model (GSM). It is a least squares approximation to the data pairs 
and not an interpolation like the Kriging surrogate model. We consider the information 
contained in the data points as extremely valuable, especially since the evaluations of 
4>{xi) are assumed computationally very expensive. So a surrogate model should be 
as accurate as possible particularly in the proximity of any Xi, which can be realized 
by interpolation rather than approximation. Therefore, the next section will introduce 
a Kriging type data fusion method to generate an interpolation model which uses the 
generic surrogate model as a global trend. 



6 Hierarchical Kriging 



Variable fidelity modeling (VFM) comprises methods for improving the approximation 
quality of interpolating only few computationally expensive evaluations (high fidelity) 
when having access to secondary data. This secondary data may consist of cheaper 
computations of a less accurate model (low fidelity) or a second variable which is as- 
sumed to be correlated with the primary variable. E.g. in CFD, computations with a 
Navier-Stokes code are regarded as high fidelity data, while computations of the same 
problem with an Euler code serve as low fidelity data. In this paper, we use the generic 
surrogate model 4> (51) as the low fidelity model to improve the interpolation quality 



of the (high fidelity) evaluations {(xf, 4>(xi))Yi = i- So far, two major VFM frameworks 
could be distinguished. Cokriging, originally developed in geostatistics, establishes a 
relation between primary and auxiliary variable by cross correlation [46J. Other VFM 
methods use an (additive, multiplicative or hybrid) bridge function, which corrects the 
discrepancy between a lo-fi and a hi-fi surrogate model [T7]. Recently, a new robust VFM 
method was introduced [16] , whose implementation and computational complexity does 
not exceed the common Kriging method. It is called hierarchical Kriging and the ansatz 
is a straightforward extension of section [2] 
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In the Kriging model ([!]) 

4>(x) = 0(j)(W(p),p,a) +z(x), x<EQcR d , 



(53) 



the regression term is replaced by the lo-fi model (in our case the generic surrogate model 
0) with /3 € M and z(x) capturing the lack of fit like in (2). Analogously to (3), the 
hierarchical Kriging predictor 

(j){x) = c(x) T f (54) 

is a weighted sum of <p = (4>(xi), . . . , </>(x n )) T . Again, the weights Cj(x) are determined 
by solving the linear equation 



R $ 
$ T 




r(x) 
Mx{p),P, a) 



(55) 



which minimizes the mean squared error subject to the unbiasedness constraint Q. R, 



pnxl 



replaces 



r(x), c(x) and fJ,(x) are the same as in (5), while = <p(xi(p),p,a) 

the linear regression design matrix and (j)(x(p),p,a) G M. contains the evaluation of 
the generic surrogate model in i. In the same manner we derive a closed form of the 
hierarchical Kriging predictor 



(x) = c(x) <p 



r{x) 
Mx{jp),p,a) 



R $ 

$ T 



-i 



(56) 



or equivalently with j3 = ($ T R 1 <I>) 1 <£ T i? V 



(x) = P4>(x(p),p, a) + r(x) T R 1 (tp - (3$) . 



(57) 



Formulation (56) is more suitable for the implementation, because j3 does not need to 



be computed explicitly and the solution of the linear system [ * ] 1 ( o ) oru y nas *° 
be computed once since it is independent of x. Also one can witness that, inserting any 
Xi (i = 1, ... ,n), the hierarchical Kriging predictor is an interpolator (4>{xi) = 4>{xi)). 



Formulation (57) demonstrates how the predictor works. Assuming that the generic 



surrogate model fit <p approximates the data 4>(x{), f3 will be close to 1. The second 
summand is a weighted sum of correlation functions which "pulls" the response towards 
the exact evaluations Mx;). 



Figure [T] outlines the generic surrogate modeling framework. We distinguish between 
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function database 




Figure 1: Generic surrogate modeling framework 

offline costs and online costs for model generation. The three steps on the left hand 
side, generation of accurate surrogate models for all database functions, solving the 
correspondence problem and computing the proper orthogonal decomposition, only have 
to be performed once so they are merely offline costs. When interpolating sample data 
for any new test case, evaluating the computer experiment for the samples, the gappy 
POD fit and building the hierarchical Kriging model are online costs. We assume that 
one single evaluation of the computer experiment 4 > { x i)i e -S- a CFD solution, dominates 
the computational cost of model generation. 

7 Numerical Results 

We choose the following test case for the validation of the methods of this paper. For 
two-dimensional airfoil geometries, we approximate the aerodynamic coefficients lift (q), 
drag (cd) and pitching moment (c m ), depending on the Mach number and the angle of 
attack a. The data is obtained by RANS computations with the DLR TAU-code [40] . 
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We consider a database of 24 airfoils: 23 airfoils from the NACA 4-digit series (first 
digit 6 [3.0,6.0], second digit £ [1.0,6.0] and the last two digits which determine the 
thickness are fixed at 12) and the RAE 2822, see figure[2j For each airfoil, we compute an 
accurate surrogate model for each problem class (q(Mo, a), Cd(Ma, a), c m (Ma, a)) in the 
reference domain Q, := [0.2,0.9] x [—4°, +12°]. Note that if alignment of the database is 
considered ( |14[ ) , the surrogates should be accurate in a domain larger than the reference 



domain (e.g. [0.1,1.0] x [—6°, +14°]) due to possible translations (12) in the alignment 



process, see figure [3j For each airfoil, 400 CFD solutions are computed on a 20 x 20 
tensorgrid which covers the reference domain. This sums up to 9600 CFD simulations for 
generating the database. Depending on the input parameters (Ma, a), one flow solution 
takes from 30 minutes to over 3 hours of CPU time. All computations can be performed 
independently from each other in parallel and using two AMD Opteron architectures 
with 48 2.3GHz cores each, the generation of the databases took approximately two 
weeks. 




a. 



Figure 2: Airfoil database, 9 examples from the NACA 4-digit series and the RAE 2882. 



c. transformations 



Op, transformations 



c., transformations 






Figure 3: Red: reference domain $7, blue: preimages of f2 for the transformations (12) 



For each of the three test cases, after solving the correspondence problem (14) the 



database contains the aligned surrogate models yi(x), . . . ,?/24(x). Figure [4] shows four 
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examples of lift responses. Mutual characteristics like the linear behavior in the left 
part, the local maximum in the upper right corner or the curvature which reaches from 
the lower right to the upper middle are aligned as good as possible by the admissible 
transformation. 





Figure 4: q database functions 



Subsequently, the proper orthogonal decomposition of y = span{yi, . . . ,2/24} is per- 
formed. The choice of the POD-basis' rank Z, i.e. the number of POD-basis elements, is 



carried out automatically. Using the approximation error formula (31), Z is chosen such 
that 



A " > 0.999. (58) 



In this way, we guarantee that the POD-basis {^l, • • • ,ipi} contains at least 99.9% of 
3^'s information. Table [T] shows the number of required POD-basis elements for the 
three test cases. Using an aligned database reduces the number by one in each case, 
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meaning that by aligning the characteristic features of the database functions the total 
variation is reduced. Figure [5] illustrates the rapid decay of the eigenvalues for the q test 



case, therefore already four POD-basis elements are sufficient for (58). The properties 
of ipi, . . . ,tp4 are depicted in figure |6j ipi is close to an average q response, 1^2 and ^3 
both essentially control the curvature from the lower right to the upper middle as well 
as the position of the local maximum in the upper right and V4 mostly influences the 
behavior in the lower right. 





Ci Cd Cm 


POD (aligned) 
POD (no alignment) 


4 4 5 

5 5 6 



Table 1: Rank of the POD-basis. 
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Figure 5: Distribution of the eigenvalues Aj for the c\ POD basis 



With the POD-basis, we are now able to perform hierarchical Kriging interpolation 



(56) based on the generic surrogate model (51) for new test cases, i.e. for responses of 
new airfoil geometries. We choose a NACA 3.375 2.875 1 2 profile, an airfoil which is 
not contained in the database, to demonstrate our surrogate modeling framework. The 
CFD solver is evaluated on a 40 x 40 discrete grid ri val C f2 to generate a set of validation 
data for the purpose of error evaluation. We use 



1 



m 



a\n val \ 



^2 00*0 ~ 00*0 



ceft val 



and rj 



— max \4>{x) 



4>(x) 



(59) 
(60) 
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the relative average error and the relative maximum error as measures of accuracy, where 
a denotes the standard deviation of the set {<P(x)} x& ^ vs i and |f2 val | = 1600. In a first 
study, the performance of ordinary Kriging interpolation is compared to hierarchical 
Kriging based on the generic surrogate model with and without the alignment (figure 
[7]). We compute 10 Latin hypercube samplings each for the sample sizes 5, 7, 10, 15, 20, 
30, 40 and 50 and use them to generate interpolations, for each of the three surrogate 
modeling techniques and for each of the three responses q, Cd, c m . For each sample 
size, an average performance of the 10 Latin hypercube samplings is computed. The 
figure shows clearly that both hierarchical Kriging methods (blue and red) outperform 
the ordinary Kriging method (green) in terms of average error as well as maximum error. 
Exceptions are the accuracy of the q and the c m approximations for the Latin hypercube 
sample sizes of 5 and 7, respectively. The reason is that for very small sample sizes, the 
gappy POD approximation (41 )/( 52 ) has too many degrees of freedom (number of basis 
elements plus optionally number of transformation parameters) compared to the number 
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of conditions (number of samples). This results in the few samples being approximated 
very closely by the linear combination of (aligned) POD-basis elements at the cost of an 
unfavorable behavior in the rest of the domain Q. One can further observe that using 
transformations in the generic surrogate model (red) decreases the approximation errors 
compared to hierarchical Kriging based on non aligned generic surrogate models (blue) 
for sample sizes > 20. So the effort for the solution of a nonlinear least squares gappy 
POD problem is justified. The computational costs of surrogate model generation are 
negligible compared to one single evaluation of the CFD solver: the maximum CPU 
time for a sample size of 50 was 40 seconds. Comparing the performances of both 
hierarchical Kriging methods depending on the sample size, one recognizes that with 
increasing number of samples the accuracy is not necessarily improved, especially for 40 
and 50 samples. 



Therefore, we also apply adaptive sampling strategies. Unlike Latin hypercube or 
Monte Carlo sampling strategies, the samples are generated sequentially. At every stage 
of the adaptive process, a surrogate model is generated and assessed in order to find 
a new sample location x* , in which the unknown function is evaluated. The data pair 
(x*, 4>(x*)) is subsequently added to the current sampling. We investigated two adaptive 
sampling strategies, which are easy to implement in the generic surrogate modeling 
framework. The first one chooses the new sample x* where the predicted mean squared 

error MSE 4>(x) := E (^>(x) — 4>{x)^j is highest [37]. This quantity can easily be 

computed by 



MSE 



<P(x) 




r(x) 





' R 




1 


) 










r(x) 
6(W(p),p, a) 



(61) 



it is equal to zero in every previously sampled location and grows with distance to them. 
The new sample location is determined by 



arg max MSE 



4>{x) 



(62) 



A second adaptive sampling strategy for variable fidelity modeling is proposed in (T7J . In 
the sequel, we distinguish between the GSM-based hierarchical Kriging ^gsm(^) := 4>{ x ) 
and an ordinary Kriging interpolation 0kri(^)- Assuming that both surrogate models 
will converge to the true function <fi{x) with growing (maybe very large) number of 
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Figure 7: Average performances for Latin hypercube samplings, 
samples, a new sample is added where the error between both models is highest: 



x = arg max 



^GSmO) - <t>KRl(x) 



(63) 
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Figure 8: Comparison of Latin hypercube samplings and adaptive sampling strategies. 

For the hierarchical Kriging based on aligned gappy POD fittings which performed best 
in the study above, we compare both adaptive sampling strategies to the average Latin 
hypercube performance (red) in figure [8| For both methods we use an initial sampling 
with 5 points. For the MSE method (cyan) no clear assessment is possible. For the 
Q response its performance is comparable to the Latin hypercube samplings up to 30 
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Figure 9: Distribution of samples. 



samples and outperforms them for larger sample sizes. The cy response is approximated 
more accurately by the Latin hypercube samplings than by the MSE method throughout 
almost all sample sizes, only for 40-50 samples the adaptive MSE is more accurate in 
terms of the average error 771 . For the c m response, its average error is comparable to 
the nonadaptive samplings' performance, while the maximum error can not be reduced 
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in the adaptive process and is higher than for Latin hypercubes. The second adaptive 
strategy (magenta) seems to produce more robust results. Despite a long starting phase 
for the Cd response, where it performs worse than the Latin hypercube samplings, at the 
latest for 45 samples it is more accurate, even earlier for q and. Cm. Figure [9] illustrates 
surrogate models and the distributions of sample points generated by the two adaptive 
methods. The 5 point initial sampling is depicted in black squares. The MSE method 
merely generates a space filling design. When the correlation lengths of the x 1 and the 
x 2 dimension differ, many samples are placed at the lower and upper boundary of the 
axis with the larger correlation length, an observation already described in a previous 
publication |37j . The second method adds more samples in regions where the response 
has larger gradients or curvature, e.g. high Mach number for all three responses or 
the already mentioned curvature of the q response. This behavior is desirable, since 
traditionally in these regions surrogate models are least accurate and need more samples 
to describe the characteristics of the true function <p{x). On the other hand, the lower 
and upper boundary of the a-axis obtain almost as many samples as in the MSE sampling 
strategy. These two attributes make this sampling strategy the most powerful in our 
test cases. We emphasize that in a previous study of similar test cases [37J, adaptive 
sampling strategies also required a starting phase and the benefit compared to Latin 
hypercube samplings was observed only after a total number of 40-60 samples. 



8 Conclusions 



In this paper a new approach for globally valid surrogate models was developed. We 
proposed a variable fidelity framework, which uses a generic surrogate model as a global 
trend. Generic surrogate models can be used, whenever surrogate models for multiple 
test cases of a predefined problem class are available. We addressed how to estab- 
lish correspondence between these database functions and how to identify characteristic 
structures by POD. The generic surrogate model was introduced as a gappy POD fit to 
the sample data and we extended the method to nonlinear transformations. Hierarchical 
Kriging, a recently developed VFM method, was used to interpolate sample data based 
on the generic surrogate model. In contrast to other VFM methods, where high-fidelity 
and low-fidelity data are both computed "online", i.e. for every new test case of a prob- 
lem class, in generic surrogate modeling the database functions are computed once and 
stored ( "offline" ) , such that for every new test case of the mutual problem class only the 
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high-fidelity data has to be computed online. We validated the methods in a test case 
of aerodynamic coefficients of two-dimensional airfoils depending on the input parame- 
ters (Ma, a). A database of surrogates was generated using RANS computations with 
DLR TAU for 24 airfoils. For a new airfoil not contained in the database interpolations 
were generated for Latin hypercube samplings. Hierarchical Kriging based on generic 
surrogate models performed more accurate than ordinary Kriging interpolations and the 
benefit was largest for sample sizes up to 30. We also showed that further improvement 
of the approximation quality is possible using adaptive sampling strategies. Requiring 
significantly less expensive samples to achieve a desired accuracy than ordinary Kriging, 
hierarchical Kriging based on generic surrogate models describes an efficient framework 
in surrogate modeling. 
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